Addressing surprisal deficiencies in reading time models
Authors
Abstract
This study demonstrates a weakness in how n-gram and PCFG surprisal are used to predict reading times in eye-tracking data. In particular, the information conveyed by words skipped during saccades is usually not included in the surprisal measures. This study shows that correcting the surprisal calculation improves the predictivity of n-gram surprisal and that upcoming n-grams affect reading times, replicating previous findings on how lexical frequencies affect reading times. In contrast, the predictivity of PCFG surprisal does not benefit from the correction, even though the corrected n-gram measure demonstrates that readers do process the lexical sequences skipped by saccades. These results raise questions about the formulation of information-theoretic measures of syntactic processing, such as PCFG surprisal and entropy reduction, when applied to reading times.
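The correction described in the abstract can be illustrated with the chain rule of probability: the surprisal of a fixated word, including the words skipped since the previous fixation, is the sum of the per-word surprisals over the skipped words plus the fixated word. The sketch below is a minimal illustration under a toy bigram model; the function names and probability values are assumptions for demonstration, not the paper's actual model or corpus.

```python
import math

def surprisal(bigram_probs, prev, word):
    """Per-word surprisal in bits: -log2 P(word | prev)."""
    return -math.log2(bigram_probs[(prev, word)])

def corrected_surprisal(bigram_probs, context, skipped, fixated):
    """Surprisal of a fixated word including the words skipped by the
    saccade. By the chain rule, -log2 P(skipped..fixated | context)
    decomposes into a sum of per-word surprisals."""
    total = 0.0
    prev = context
    for w in skipped + [fixated]:
        total += surprisal(bigram_probs, prev, w)
        prev = w
    return total

# Toy bigram table (illustrative probabilities, not estimated from data).
probs = {
    ("the", "old"): 0.1,
    ("old", "man"): 0.2,
    ("man", "sat"): 0.25,
}

# A reader fixates "the", skips "old" and "man", then fixates "sat".
uncorrected = surprisal(probs, "man", "sat")  # ignores the skipped words
corrected = corrected_surprisal(probs, "the", ["old", "man"], "sat")
```

Because the corrected measure accumulates the information in the skipped region, it is always at least as large as the uncorrected per-word value.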
Similar resources
Lexical surprisal as a general predictor of reading time
Probabilistic accounts of language processing can be psychologically tested by comparing word-reading times (RT) to the conditional word probabilities estimated by language models. Using surprisal as a linking function, a significant correlation between unlexicalized surprisal and RT has been reported (e.g., Demberg and Keller, 2008), but success using lexicalized models has been limited. In th...
Early effects of word surprisal on pupil size during reading
This study investigated the relation between word surprisal and pupil dilation during reading. Participants’ eye movements and pupil size were recorded while they read single sentences. Surprisal values for each word in the sentence stimuli were estimated by both a recurrent neural network and a phrase-structure grammar. Higher surprisal corresponded to longer word-reading time, and this effect ...
Word surprisal predicts N400 amplitude during reading
We investigated the effect of word surprisal on the EEG signal during sentence reading. On each word of 205 experimental sentences, surprisal was estimated by three types of language model: Markov models, probabilistic phrase-structure grammars, and recurrent neural networks. Four event-related potential components were extracted from the EEG of 24 readers of the same sentences. Surprisal estima...
Surprisal-based comparison between a symbolic and a connectionist model of sentence processing
The ‘unlexicalized surprisal’ of a word in sentence context is defined as the negative logarithm of the probability of the word’s part-of-speech given the sequence of previous parts-of-speech of the sentence. Unlexicalized surprisal is known to correlate with word reading time. Here, it is shown that this correlation grows stronger when surprisal values are estimated by a more accurate language ...
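The definition above can be written directly in code: unlexicalized surprisal is −log₂ of the probability of a word's part-of-speech tag given the preceding tags. The sketch below is a minimal illustration; the hand-specified POS bigram table stands in for a trained language model and its probabilities are illustrative assumptions.

```python
import math

def unlexicalized_surprisal(pos_model, pos_history, pos_tag):
    """-log2 P(pos_tag | pos_history): surprisal over part-of-speech
    tags rather than word forms."""
    return -math.log2(pos_model(pos_history, pos_tag))

# Illustrative POS bigram table (assumed values, not trained).
table = {("DT", "NN"): 0.5, ("NN", "VBD"): 0.25}

def bigram_model(history, tag):
    """Condition only on the most recent tag, a bigram approximation
    of the full history."""
    return table[(history[-1], tag)]

# Surprisal of a noun following a determiner under this toy model:
s = unlexicalized_surprisal(bigram_model, ["DT"], "NN")
```

A stronger model (e.g., a higher-order Markov model or a grammar-based one) simply supplies a better `pos_model`, which is exactly the manipulation the abstract describes.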
Predictive power of word surprisal for reading times is a linear function of language model quality
Within human sentence processing, it is known that there are large effects of a word’s probability in context on how long it takes to read it. This relationship has been quantified using informationtheoretic surprisal, or the amount of new information conveyed by a word. Here, we compare surprisals derived from a collection of language models derived from n-grams, neural networks, and a combina...
Journal:
Volume, Issue:
Pages: -
Publication date: 2016